A longitudinal causal graph analysis investigating modifiable risk factors and obesity in a European cohort of children and adolescents

Childhood obesity is a complex disorder that appears to be influenced by an interacting system of many factors. Taking this complexity into account, we aim to investigate the causal structure underlying childhood obesity. Our focus is on identifying potential early, direct or indirect, causes of obesity which may be promising targets for prevention strategies. Using a causal discovery algorithm, we estimate a cohort causal graph (CCG) over the life course from childhood to adolescence. We adapt a popular method, the so-called PC-algorithm, to deal with missing values by multiple imputation, with mixed discrete and continuous variables, and that takes background knowledge such as the time-structure of cohort data into account. The algorithm is then applied to learn the causal structure among 51 variables including obesity, early life factors, diet, lifestyle, insulin resistance, puberty stage and cultural background of 5112 children from the European IDEFICS/I.Family cohort across three waves (2007–2014). The robustness of the learned causal structure is addressed in a series of alternative and sensitivity analyses; in particular, we use bootstrap resamples to assess the stability of aspects of the learned CCG. Our results suggest some but only indirect possible causal paths from early modifiable risk factors, such as audio-visual media consumption and physical activity, to obesity (measured by age- and sex-adjusted BMI z-scores) 6 years later.


Covariates
We included variables reflecting eating behaviour, lifestyle, social, cultural and environmental factors that are assumed to be related to overweight and obesity across the early life course.A detailed description of all measurements and their units used in our analysis is provided in Table 1 and in the supplement.Some of these variables are time-invariant and would not be targeted by any intervention programme in later childhood, such as region of residence or migration background.Other time-invariant variables might impact a child's development during pregnancy and as an infant, such as mother's age at birth or breastfeeding duration; we will refer to these as early life factors.All other variables are time-varying and were measured repeatedly.Age-and sex-specific BMI z-scores (BMI) for children and adolescents were calculated according to the extended IOTF criteria 16 ; for simplicity we refer to these as BMI.Mother's BMI was assessed at each survey in kg/m 2 .The homeostatic model assessment (HOMA-IR, short HOMA) index 17 served as a marker for insulin resistance.The diet of the child was measured by a validated FFQ 18 and was classified by an adapted version of the Youth Healthy Eating Index (YHEI) 19 .The YHEI assesses the consumption frequencies of both healthy and unhealthy food as well as eating behaviours, where a higher score indicates a healthier diet 20 .PA was measured by questionnaire, and an audio-visual media consumption score (AVM) was used as proxy for sedentary behaviour.Total sleep duration including nocturnal sleep was estimated based on 24-h dietary recall data at baseline 21 and quantified by selfreports at the two follow-ups.Multiple dimensions of psychosocial well-being were assessed by questionnaire which was developed for parents' response on behalf of children and adolescents 22 .Children above the age of 12 completed the questionnaires for themselves.Further details on the study population and used covariates are given in the supplement.

Statistical analysis
For our analyses, only children who participated in all three surveys were considered.Multiple imputation (MI) was applied to avoid loss of study subjects and to reduce potential bias due to missing values 23 ; specifically we used tenfold imputation with random forests as implemented in the R-package mice 24 .MI assumes that values were missing at random (MAR).To strengthen the plausibility of the MAR assumption, the imputation models were fitted on a larger dataset containing additional variables that contribute to the various scores such as AVM or well-being 23 .
To estimate the cohort causal graph (CCG), we applied a method of causal discovery known as PC-algorithm 25,26 .The algorithm outputs empirically plausible causal directed acyclic graphs (causal DAGs) suggesting direct and indirect causal relations, as shown by directed edges or directed paths.We chose this particular algorithm because other, especially likelihood-based approaches typically make more implicit or explicit distributional assumptions which would seem highly implausible for the given cohort data.While the PC-algorithm also makes assumptions, there is some more robustness of our approach, e.g. in the context of multiple imputation 27 .As a DAG represents certain conditional (in)dependencies between variables 28 , the PC-algorithm proceeds by investigating conditional independencies in the data using statistical tests, and then determines all DAGs that agree with these independencies.The result is not unique since different DAGs can represent the same conditional independencies, i.e. certain causal structures are indistinguishable.Instead, the algorithm outputs the equivalence class of all DAGs that represent the detected conditional independencies.This class is represented by a so-called completed partially directed acyclic graph (CPDAG) 29 containing directed and undirected edges, where an undirected edge means that both causal directions occur in the equivalence class.The validity of the PCalgorithm relies on the assumptions of causal sufficiency, i.e. absence of latent confounding, and of faithfulness, under which the PC-algorithm consistently selects the true CPDAG 25 .Of note, while the causal interpretation of directed edges or paths in the output of causal discovery algorithms relies on causal sufficiency, which may often be implausible, the absence of such edges and paths can still be interpreted as absence of causal relations even without causal sufficiency.
The PC-algorithm had to be modified for application to multiply imputed cohort data 11,27,30 .Further, to account for the cohort structure we used the tiered PC-algorithm tPC 31 .This was then combined with functions from micd 32 to deal with multiply imputed data containing a mix of categorical and continuous variables.The R packages micd and tPC are both extensions of pcalg 33 .The tPC-algorithm outputs a maximally oriented partially directed acyclic graph (MPDAG), which is similar to a CPDAG but can contain more directed edges due to background knowledge 13,34 .tPC determines an MPDAG under the restriction that edges are prohibited from pointing backwards in time which also reduces the number of required statistical tests for conditional independencies.In our analysis we pre-specified the following ordering: region, sex and migration → ISCED and income at baseline → all early life factors → baseline variables → ISCED and income at FU1 → remaining FU1 variables → ISCED and income at FU2 → remaining FU2 variables.Additionally, specific orientations between certain pairs of variables were prohibited, for example from breastfeeding to birth weight.We carried out a number of alternative and sensitivity analyses to check the robustness of the estimated MPDAG against specific analytical choices: (a) while the main analysis used a nominal level of 0.05 for the conditional independence tests, we compared this with a nominal level of 0.1 (MI-0.1);(b) using test-wise deletion (TWD) instead of MI and (c) applying a different, likelihood-based, causal discovery algorithm which uses the EM algorithm for missing values 35 .Moreover, to assess the general stability of the output we drew 100 bootstrap samples from the analysis data, applied to each a single random forest imputation using the same imputation model as in the main analysis, and then estimated 100 bootstrap graphs (BGs).Thus, we can take the frequencies of interesting causal structures in the bootstrap samples as indication of their stability, e.g.specific edges (direct causal links) or indirect links via (partially) directed paths between exposures and outcome.In a directed path, all edges between two nodes are directed, while in a partially directed path, at least one edge between two nodes is undirected.More background on causal graphs and other graph characteristics are described in the supplement.

Study sample
The study sample included 5,112 children who participated in all three surveys.Table 2 shows that children were on average aged 5.9 years at baseline and 11.7 years at FU2.At baseline, 12.6% of the children have overweight and 6.7% suffer from obesity.BMI z-scores increased on average by approx.0.2 standard deviations (SD) over the years (0.32 to 0.55).The overall number of missing values was 15% with some variables exhibiting very large numbers of missings such as PA at FU2 (50.1%) (see Figure S1 and Table S1 characteristics after imputation).Diagnostic plots of the multiply imputed data were satisfactory (see Figure S2).

Cohort causal graph
The CCG resulting from our main analysis is shown in Fig. 1 (see also https:// bips-hb.github.io/ ccg-child hoodobesi ty for an interactive graph).Overall the graph had 104 edges linking 51 variables, of which 12 could not be oriented.Focusing on BMI as outcome, there were direct links from region, familial educational level, birthweight and mother's BMI (B) to BMI (B); in contrast, there were no paths from any of the modifiable risk factors to BMI (B).However, all of these modifiable baseline factors (sleep, AVM, YHEI, PA, well-being) were possible ancestors and hence possible causes of BMI in both follow-ups (cf.Table 3), i.e. they had partially directed paths to BMI.These included paths from all five modifiable baseline risk factors to BMI six years later.For instance, there were five partially directed paths from YHEI (B) to BMI (FU2) (Fig. 2).Almost all paths between exposures and BMI (FU2) went through AVM (FU1) and HOMA (FU1, FU2), many also through well-being (FU1) and some through YHEI (B).In the CCG we also see that the exposures themselves were moderately interconnected within the same tier and across time, with many orientations of edges among the exposures at FU1 being undecidable.Note also that most repeated measurements were linked by edges with the exception of BMI.

Bootstrap analysis
We assessed the stability of selected features of the main CCG based on 100 BGs.Of the 104 edges in the main CCG, 36 were found in more than 80% of BGs, with a further six edges in more than 70% of BGs (see Table S2).
Of these edges, 16 were between repeated measures of the same variable, e.g.HOMA.FU1-HOMA.FU2, and 13 emanated from modifiable risk factors.In contrast, 50 edges occurred in 50% or fewer of the BGs.The presence of any paths from exposures to BMI was rather stable.Specifically, we considered directed or partially directed paths from baseline modifiable exposures to later BMI (FU2) (see Table 4).The most frequent were paths from YHEI to BMI (84% of BGs), while paths from sleep duration to BMI were in 75% of the BGs; paths from the other three baseline exposures (well-being, AVM, PA) to BMI occurred in 80% of the BGs.There were mostly multiple causal paths found between an exposure and the outcome.For instance, the median number of different (partially) directed paths from AVM (B) to BMI (FU2) found in each BG was 20.No BGs ever contained a direct edge from a baseline modifiable exposure to BMI at FU2. Table 5 shows patterns between repeated measurements in the main CCG and the BGs.It can be seen for BMI that in 95 BGs the paths B → FU1 → FU2 or B → FU1 → FU2 ← B were found despite not being contained in the main CCG.The BGs contained on average 22 edges more than the CCG in the main analysis.For comparison with this main CCG, we constructed a graph containing the same number of edges based on the most frequent edges; this resulted in the inclusion of all edges that occurred in more than 44 of BGs (see Fig. S6).The (structural) Hamming distance between main CCG and BG44 was 56 (73), indicating that about half of the edges between the two graphs are the same.

Sensitivity analyses
Using a larger nominal significance level of 10% (CCG MI-0.1) essentially confirmed the core results from the main graph with only few more edges (Table 6, Fig. S3).The CCGs estimated with two alternative methods for missing values (TWD and EM) were with 40 to 50% more edges less sparse than the main graph (cf.Figs.S4,  S5), where only 20% of the edges in the main analysis were also found in the TWD graph.This was also reflected by the Hamming distances, which was large with 205 for TWD compared to the main CCG.The structural

Discussion
The estimated CCG suggested rather sparse causal relationships between various variables around childhood obesity, with dependencies of repeated measures across time being the strongest and most stable as one might expect.All the individually modifiable risk factors diet, PA, sleep duration, subjective well-being and audio-visual media consumption at baseline were stably found to be possible indirect, but not direct, causes of BMI 6 years later, mostly via the HOMA index which was closely linked to BMI.Associations between media exposure [36][37][38][39] , sleep [40][41][42] , PA 40 , diet 40 , well-being 41 and insulin resistance measured by HOMA were previously found by others and in the IDEFICS/I.Family cohort, partly in smaller subsets and using different variables such as objective accelerometer-based measurements of PA [43][44][45] .Insulin resistance is strongly associated with obesity, which is reflected by an undirected edge in the CCG.Excess adipose tissue is a known risk factor for insulin resistance; however, normal-weight children may also be affected 46 .From the early life factors, birthweight was a (possible) ancestor of BMI (B, FU1, FU2) and formula milk feeding for BMI (FU1, FU2).High birth weight is known to be associated with childhood obesity 47 ; and a recent systematic review describes that there is moderate evidence that breast milk consumption reduces the risk of overweight and obesity at age 2 years and older 48 .Overall, our results suggested that early life interventions targeting health behaviours of the child will have some, but only indirect effects on BMI 49 .Hence, cultural, perinatal and familial variables are potentially more immediate causal influences on obesity.Based on the selected CCG, we might therefore hypothesise that early life interventions alone may be insufficient to prevent childhood obesity.Indeed, Börnhorst et al. 49 found that even sustained (over 13 years) and joint hypothetical interventions on multiple modifiable behaviours only reduced the risk of obesity in children from 31 to 25%.Thus, our finding is compatible with the view that the causal structure governing childhood health behaviours and outcomes should be considered from a complex adaptive system's perspective [50][51][52] .Lee et al. 50emphasize that obesity is shaped by multiple factors which act at different scales such as individual behaviour and physiology, but also genetics, social dynamics, the built environment, and societal forces.As a way forward, Maitland et al. 53 , for example, describe the practical implementation of a "whole of systems" approach.
Using sensitivity analyses we investigated the robustness of the CCG regarding the handling of missing values and used bootstrap samples to assess the stability of learned graph structures.The method for handling missing values is not negligible as more complex and quite different graphs were estimated using TWD or the EM-algorithm instead of MI.Moreover, it was noticeable that the TWD graph, unlike the CCG, was not able to detect edges between repeated measurements.Witte et al. 27 showed that TWD can fail in recovering certain causal structures regardless of the underlying missingness mechanism (MCAR, MAR or MNAR).Further, MI was usually more efficient than TWD, although datasets including variables with mixed measurement scales were more problematic.
We used bootstrap resamples to account for the uncertainty in the selection of the CCG [54][55][56] .In interpreting the results, it has to be kept in mind that the BGs tended to have more edges than the main CCG, due to spurious dependences induced by sampling with replacement from the given data 56,57 .We therefore considered the BGs purely as a measure of the stability rather than, say, for estimating edge probabilities.Thus, edge and path  www.nature.com/scientificreports/frequencies indicate the stability of presence and absence of certain graph structures.While about a third of the learned edges in the main analysis were quite stable, we also found that half of the edges were rather unstable.Similarly, we found that the existence of some paths from early modifiable risk factors to later BMI was quite stable, but the actual paths themselves were very variable, i.e. a particular path may not be selected in more than 20% of BGs.In contrast, the absence of direct links from early modifiable risk factors to later BMI was very stable as these occurred in no BGs.This can be interpreted as the absence of direct causal influences even when the assumption of causal sufficiency is violated.
The main analysis was able to find the expected paths for repeated measurements of HOMA and all modifiable risk factors, but not for BMI, and only partly for daily family meals and mother's BMI.The BGs runs revealed that missing edges between the repeated measurements of BMI are very rare.The CCG is therefore difficult to explain in this respect.In contrast, the learned CCG suggests the plausible relationship that BMI is conditionally independent of modifiable risk factors given the child's insulin resistance status (HOMA).The instabilities that we found through the bootstrap analysis might partly be explained by the rather low sample size for the perhaps rather weak associations, the extra uncertainty due to the high proportion of missing values, and the large intervals between follow-ups.Especially the confidence in specific paths might be rather low which is critical.A greater stability would, for instance, be desirable for subsequent analyses that use a learned causal graph to determine adjustment sets to estimate causal effects 8 .Some graphical rules for identifying adjustment sets just take the adjacent nodes of the exposure into account and others require also the mediators between exposure and outcome, for which reliable knowledge on causal paths is required 58,59 .
Recently, Peterson, Osler & Ekstrom 14 also proposed an extension of the PC-algorithm to include temporal information for inferring a graph from observational data.However, our extensions of the PC-algorithm allows the first application of causal discovery to real-world cohort data accounting jointly for missing values, mixed discrete and continuous variables, and background knowledge such as time-ordering.The required theory and software have only recently been developed 11,27 .
The IDEFICS/I.Family cohort provides a rich source of phenotypes capturing different dimensions of dietary and lifestyle related health aspects repeatedly measured over the early life course.However, a challenge was the choice of variables included in the analysis; these needed to be sufficiently different (i.e.not measuring the same underlying construct) to find meaningful dependencies between the different dimensions of obesity.The further sensitivity analyses (see web page) showed that different choices yielded slightly different selected CCGs, but the overall message remained the same: adolescents' BMI was not directly affected by earlier behavioural variables, but had indirect, potentially causal, links through AVM (FU1) and HOMA (FU1, FU2).
Further general sources of bias with observational data could also affect our results, such as reporting or selection bias.However, all participating countries adhered to a harmonised protocol and to quality control procedures ensuring high data quality.

Figure 1 .
Figure 1.Causal graph of childhood obesity based on N = 5112 European children and adolescents born between 1997 and 2006 estimated by the tiered PC-algorithm for multiple imputed datasets.The nodes colours correspond to the different stages of the life course.Edges without arrowheads could not be orientated by the algorithm.An overlap of nodes and edges was unavoidable.We advise to look at the interactive graphs here: https:// bips-hb.github.io/ ccg-child hood-obesi ty/.AVM audio-visual media consumption, B baseline, FU1 first follow-up, FU2 second follow-up, HH diet: month when the child was introduced into the household's diet, HOMA homeostatic model assessment-insulin resistance, ISCED highest parental education (International Standard Classification of Education), PA physical activity, YHEI youth healthy eating index, zBMI body mass index z-score.

Figure 2 . 4 3 26 Continued
Figure 2. All five possible causal paths between the Youth Healthy Eating Index (YHEI) at baseline and zBMI at the second follow-up (AVM audio-visual media consumption, PA physical activity).AVM audio-visual media consumption, B baseline, FU1 first follow-up, FU2 second follow-up, HH diet month when the child was introduced into the household's diet, HOMA homeostatic model assessment-insulin resistance, ISCED highest parental education (International Standard Classification of Education), PA physical activity, YHEI youth healthy eating index, zBMI body mass index z-score.

Table 1 .
Variables used in the analysis with units and further explanations.Background knowledge was used to order them into different tiers.Units of continuous variables are given in italics.B baseline, FU1 first follow-up, FU2 second follow-up.

Table 2 .
Characteristics of children in the IDEFICS/I.Family cohort participating in all three surveys from 2007 to 2014. a n (%); mean (standard deviation).FU1 first follow-up, FU2 second follow-up, BMI body mass index, ISCED highest parental education (International Standard Classification of Education).

Table 3 .
Possible ancestors of BMI at baseline, first and second follow up.f.p.: Path between a pair of vertices was forbidden a priori (e.g.due to time constraints).AVM audio-visual media consumption, BMI body mass index, B baseline, C context variables, ELF early life factors, FU1 first follow-up, FU2 second follow-up, HOMA homeostatic model assessment-insulin resistance, ISCED highest parental education (International Standard Classification of Education), PA physical activity, sleep nocturnal sleep, YHEI youth healthy eating index.The resulting CCG suggested that cultural, perinatal and familial factors and insulin resistance (HOMA-IR) potentially played a more immediate causal role than individually modifiable risk factors which had stable but only indirect relations with adolescents' BMI.

Table 4 .
Directed and partially directed paths between modifiable risk factors at baseline and BMI 6 years later in the original CCG and in 100 Bootstrap graphs (BGs).AVM audio-visual media consumption, B baseline, BMI body mass index, FU1 first follow-up, FU2 second follow-up, HOMA homeostatic model assessment-insulin resistance, ISCED highest parental education (International Standard Classification of Education), PA physical activity, Sleep nocturnal sleep, YHEI youth healthy eating index.

Table 5 .
Path patterns between repeated measurements in CCG and Bootstrap graphs.AVM audio-visual media consumption, B baseline, BMI body mass index, FU1 first follow-up, FU2 second follow-up, HOMA homeostatic model assessment-insulin resistance, ISCED highest parental education (International Standard Classification of Education), PA physical activity, sleep nocturnal sleep, YHEI youth healthy eating index.Bold numbers: path included in main CCG.

Table 6 .
54aracteristics of the discovered graph without singletons.Avg.BG occurs on average in each BootG, BGx summarized bootstrap graph with edges that occurred at least × times in 100 bootstrap replications, EM structural EM algorithm, main multiple imputation with nominal level of 0.05, MI-0.1 multiple imputation with nominal level of 0.1, MEU mean edge uncertainty54, TWD test-wise deletion.